Euclidean norm calculations arise frequently in scientific and engineering applications. Several approximations for
this norm with differing complexity and accuracy have been proposed in the literature. Earlier approaches [1, 2, 3]
were based on minimizing the maximum error. Recently, Seol and Cheun [4] proposed an approximation based on
minimizing the average error. In this paper, we first examine these approximations in detail, show that they fit 
into a single mathematical formulation, and compare their average and maximum errors. We then show that the 
maximum errors given by Seol and Cheun are significantly optimistic. 
1. Introduction

The Minkowski ($L_p$) metric is inarguably one of the most commonly used quantitative distance (dissimilarity)
measures in scientific and engineering applications. The Minkowski distance between two vectors
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in the $n$-dimensional Euclidean space, $\mathbb{R}^n$, is given by

\[ L_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}. \tag{1} \]



Three special cases of the $L_p$ metric are of particular interest, namely, $L_1$ (city-block metric), $L_2$ (Euclidean metric),
and $L_\infty$ (chessboard metric). Given the general form (1), $L_1$ and $L_2$ can be defined in a straightforward fashion,
while $L_\infty$ is defined as

\[ L_\infty(\mathbf{x}, \mathbf{y}) = \max_{1 \le i \le n} |x_i - y_i|. \]
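As a concrete illustration of (1) and its three special cases, the following is a minimal sketch. The function name `minkowski` and the sample vectors are chosen here for illustration and do not come from the paper.

```python
import math

def minkowski(x, y, p):
    """L_p distance of equation (1); p = math.inf gives the chessboard metric."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Illustrative vectors (assumed for this example only).
x, y = (3.0, -1.0, 2.0), (0.0, 3.0, 2.0)
l1 = minkowski(x, y, 1)            # city-block: 3 + 4 + 0
l2 = minkowski(x, y, 2)            # Euclidean: sqrt(9 + 16 + 0)
linf = minkowski(x, y, math.inf)   # chessboard: max(3, 4, 0)
```

For these vectors the three values are ordered as $L_\infty \le L_2 \le L_1$, which is the general relationship between the three metrics.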

In many applications, the data space is Euclidean and therefore the $L_2$ metric is the natural choice. In addition,
this metric has the advantage of being isotropic (rotation invariant). For example, when the input vectors stem
from an isotropic vector field, e.g. a velocity field, the most appropriate choice is to use the $L_2$ metric so that all
vectors are processed in the same way, regardless of their orientation.

The main drawback of $L_2$ is its high computational requirements due to the multiplications and the square root
operation. As a result, $L_1$ and $L_\infty$ are often used as alternatives. Although these metrics are computationally
more efficient, they deviate from $L_2$ significantly. The Minkowski metric is translation invariant, i.e.
$L_p(\mathbf{x}, \mathbf{y}) = L_p(\mathbf{x} + \mathbf{z}, \mathbf{y} + \mathbf{z})$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$; hence it suffices to consider
$D_p(\mathbf{x}) = L_p(\mathbf{x}, \mathbf{0})$, i.e. the distance from the point $\mathbf{x}$ to the origin. Therefore, in the rest of the paper,
we will consider approximations to $D_p(\mathbf{x})$ rather than $L_p(\mathbf{x}, \mathbf{y})$.

In this paper, we examine several approximations to the Euclidean norm. The rest of the paper is organized
as follows. In Section 2 we describe the Euclidean norm approximations that have appeared in the literature, and
compare their average and maximum errors using numerical simulations. We then show that all of these methods
fit into a single mathematical formulation. In Section 3 we examine the simulation results from a theoretical
perspective. Finally, in Section 4 we provide our conclusions.
2. Euclidean Norm Approximations



For reasons explained in Sec. 1, we concentrate on approximations to the Euclidean norm $D_2$ on $\mathbb{R}^n$. Let $D$,
defined on $\mathbb{R}^n$, be an approximation to $D_2$. We assume that $D$ is a continuous homogeneous function. We note
that all variants of $D$ we consider in this paper satisfy these assumptions. As a measure of the quality of the
approximation of $D$ to $D_2$ we define the maximum relative error (MRE) as

\[ \varepsilon^{D}_{\max} = \sup_{\mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\}} \frac{|D(\mathbf{x}) - D_2(\mathbf{x})|}{D_2(\mathbf{x})}. \tag{2} \]

Using the homogeneity of $D_2$ and $D$, (2) can be written as

\[ \varepsilon^{D}_{\max} = \sup_{\mathbf{x} \in S_2^{n-1}} |D(\mathbf{x}) - 1|, \tag{3} \]

where

\[ S_2^{n-1} = \{ \mathbf{x} \in \mathbb{R}^n : D_2(\mathbf{x}) = 1 \} \]

is the unit hypersphere of $\mathbb{R}^n$ with respect to the Euclidean norm. Furthermore, by the continuity of $D$, we can
replace the supremum with maximum in (3) and write

\[ \varepsilon^{D}_{\max} = \max_{\mathbf{x} \in S_2^{n-1}} |D(\mathbf{x}) - 1|. \tag{4} \]

We will use (4) as the definition of MRE throughout.
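The reduction from a supremum over $\mathbb{R}^n \setminus \{\mathbf{0}\}$ to one over the unit hypersphere can be spelled out in one line. The sketch below assumes that "homogeneous" means $D(r\mathbf{u}) = r\,D(\mathbf{u})$ for every $r > 0$, which is the property the argument needs.

```latex
\begin{align*}
\mathbf{x} &= r\,\mathbf{u}, \qquad r = D_2(\mathbf{x}) > 0, \qquad
  \mathbf{u} = \mathbf{x} / D_2(\mathbf{x}) \in S_2^{n-1}, \\
\frac{|D(\mathbf{x}) - D_2(\mathbf{x})|}{D_2(\mathbf{x})}
  &= \frac{|r\,D(\mathbf{u}) - r\,D_2(\mathbf{u})|}{r\,D_2(\mathbf{u})}
   = |D(\mathbf{u}) - 1|, \qquad \text{since } D_2(\mathbf{u}) = 1 .
\end{align*}
```

Every nonzero $\mathbf{x}$ therefore has the same relative error as its normalized copy $\mathbf{u}$, so nothing is lost by restricting attention to the hypersphere.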

In the trivial case where $D = D_2$ we have $\varepsilon^{D}_{\max} = 0$. Hence, for nontrivial cases we wish to have a small $\varepsilon^{D}_{\max}$
value. In other words, the smaller the value of $\varepsilon^{D}_{\max}$, the better (more accurate) the corresponding approximation $D$.
It can be shown that $D_1$ (city-block norm) overestimates $D_2$ and the corresponding MRE is given by $\varepsilon^{D_1}_{\max} = \sqrt{n} - 1$.
In contrast, $D_\infty$ (chessboard norm) underestimates $D_2$ with MRE given by $\varepsilon^{D_\infty}_{\max} = 1 - 1/\sqrt{n}$. More explicitly,

\[ D_2(\mathbf{x}) \le D_1(\mathbf{x}) \le \sqrt{n}\, D_2(\mathbf{x}) \]
\[ (1/\sqrt{n})\, D_2(\mathbf{x}) \le D_\infty(\mathbf{x}) \le D_2(\mathbf{x}) \]

for all $\mathbf{x} \in \mathbb{R}^n$. Therefore, it is natural to expect a suitable linear combination of $D_1$ and $D_\infty$ to give an
approximation to $D_2$ better than both $D_1$ and $D_\infty$ [2].

2.1. Chaudhuri et al.'s approximation

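Chaudhuri et al. [1] proposed the approximation

\[ D_\lambda(\mathbf{x}) = |x_{i_{\max}}| + \lambda \sum_{\substack{i=1 \\ i \ne i_{\max}}}^{n} |x_i|, \quad \text{with } \lambda = \frac{1}{n - \lfloor (n-2)/2 \rfloor}. \tag{5} \]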



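Here $i_{\max}$ is the index of the absolute largest component of $\mathbf{x}$, i.e. $i_{\max} = \arg\max_{1 \le i \le n} |x_i|$, and $\lfloor x \rfloor$ is the
floor function which returns the largest integer less than or equal to $x$. Since $D_\infty(\mathbf{x}) = |x_{i_{\max}}|$, by adding and
subtracting the term $\lambda |x_{i_{\max}}|$, $D_\lambda$ can be written as a linear combination of $D_\infty$ and $D_1$ as

\[ D_\lambda(\mathbf{x}) = (1 - \lambda) D_\infty(\mathbf{x}) + \lambda D_1(\mathbf{x}). \tag{6} \]

It is easy to see that $D_\infty(\mathbf{x}) \le D_\lambda(\mathbf{x}) \le D_1(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^n$ since $0 < \lambda \le 0.5$. It can also be shown [1] that,
for sufficiently large $n$, $D_\lambda$ is closer to $D_2$ than both $D_1$ and $D_\infty$, i.e. $|D_\lambda(\mathbf{x}) - D_2(\mathbf{x})| \le |D_1(\mathbf{x}) - D_2(\mathbf{x})|$ and
$|D_\lambda(\mathbf{x}) - D_2(\mathbf{x})| \le |D_2(\mathbf{x}) - D_\infty(\mathbf{x})|$ for all $\mathbf{x} \in \mathbb{R}^n$.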
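The equivalence between the "largest component plus $\lambda$ times the rest" form and the linear combination (6) is easy to check numerically. The sketch below treats $\lambda$ as a free input; the value 0.25 and the sample vector are illustrative assumptions, not parameters taken from the paper.

```python
def d_inf(x):
    """Chessboard norm: largest absolute component."""
    return max(abs(v) for v in x)

def d_1(x):
    """City-block norm: sum of absolute components."""
    return sum(abs(v) for v in x)

def d_lambda_direct(x, lam):
    """Largest absolute component plus lam times the sum of the others."""
    mags = [abs(v) for v in x]
    i_max = max(range(len(mags)), key=mags.__getitem__)
    return mags[i_max] + lam * sum(m for i, m in enumerate(mags) if i != i_max)

def d_lambda_combo(x, lam):
    """Equation (6): (1 - lam) * D_inf + lam * D_1."""
    return (1.0 - lam) * d_inf(x) + lam * d_1(x)

x = (3.0, -4.0, 1.0)   # illustrative vector
lam = 0.25             # illustrative value only
direct = d_lambda_direct(x, lam)   # 4 + 0.25 * (3 + 1)
combo = d_lambda_combo(x, lam)     # 0.75 * 4 + 0.25 * 8
```

Both forms give the same value, and the result lies between $D_\infty(\mathbf{x})$ and $D_1(\mathbf{x})$ as stated above.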

For sufficiently large $n$, $D_\lambda$ underestimates $D_2$ and the corresponding MRE is given by

\[ 1 - \frac{1 + \lambda (n - 1)}{\sqrt{n}} \le \varepsilon^{D_\lambda}_{\max} \le 1 - \lambda \quad \text{for } n > 3. \]



1 Unfortunately, the motivation behind this particular choice of $\lambda$ is not given in the paper.
Otherwise, $D_\lambda$ overestimates $D_2$ and we have



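\[ \varepsilon^{D_\lambda}_{\max} = \begin{cases}
\sqrt{1 + \dfrac{4(n-1)}{(n+2)^2}} - 1 & \text{for } n = 2, 4, 6, \ldots \\[2ex]
\sqrt{1 + \dfrac{4(n-1)}{(n+3)^2}} - 1 & \text{for } n = 3, 5, 7, \ldots
\end{cases} \]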



Proofs of these identities can be found in [1].



2.2. Rhodes' approximations 

Rhodes reformulated (5) as a maximum of linear functions

\[ D_\lambda(\mathbf{x}) = \max_{1 \le j \le n} \Big\{ (1 - \lambda) |x_j| + \lambda \sum_{i=1}^{n} |x_i| \Big\} \]

where $0 < \lambda < 1$ is taken as a free parameter. He determined the optimal value for $\lambda$ by minimizing
$\varepsilon^{D_\lambda}_{\max} = \max_{\mathbf{x} \in S_2^{n-1}} |D_\lambda(\mathbf{x}) - 1|$ analytically. In particular, he showed that optimal $\lambda'$ values for Chaudhuri et al.'s norm
can be determined by solving the equation $1 - 2\sqrt{\lambda' - (\lambda')^2} = \sqrt{1 + (\lambda')^2 (n - 1)} - 1$ in the interval $(0, 1/2)$. This
equation is a quartic (fourth order) in $\lambda'$ and can be solved using Ferrari's method. It can be shown that this
particular quartic equation has two real and two complex roots and the optimal $\lambda'$ value is given by the smaller of
the real roots. The corresponding MRE is given by

\[ \varepsilon^{D_\lambda}_{\max} = 1 - 2\sqrt{\lambda' - (\lambda')^2}. \tag{7} \]
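The optimal $\lambda'$ can also be obtained numerically. The sketch below is not Ferrari's method: it swaps in plain bisection on $(0, 1/2)$, which works because the left-hand side of the equation decreases and the right-hand side increases on that interval, so their difference changes sign exactly once. The function names are illustrative.

```python
import math

def rhodes_lambda(n, tol=1e-12):
    """Bisection for the root of 1 - 2*sqrt(l - l^2) = sqrt(1 + l^2*(n-1)) - 1 on (0, 1/2)."""
    def f(lam):
        lhs = 1.0 - 2.0 * math.sqrt(lam - lam * lam)
        rhs = math.sqrt(1.0 + lam * lam * (n - 1)) - 1.0
        return lhs - rhs

    lo, hi = 0.0, 0.5   # f(lo) = 1 > 0 and f(hi) < 0 for every n >= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rhodes_mre(n):
    """Equation (7) evaluated at the numerically found lambda'."""
    lam = rhodes_lambda(n)
    return 1.0 - 2.0 * math.sqrt(lam - lam * lam)
```

For $n = 4$ and $n = 10$ this reproduces, to four decimal places, the theoretical MRE values 0.1074 and 0.1837 listed for $D_\lambda$ in Table 3.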

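In the remainder of this paper, $D_\lambda$ refers to this improved variant of Chaudhuri et al.'s norm.
Rhodes also investigated the two-parameter family of approximations given by

\[ D_{\mu,\lambda}(\mathbf{x}) = (\mu - \lambda) D_\infty(\mathbf{x}) + \lambda D_1(\mathbf{x}) \tag{8} \]

where $0 < \lambda < \mu$. He proved that the optimal solution and its MRE in this case are given by

\[ \lambda^* = \frac{2}{2 n^{1/4} + \sqrt{2n + 2\sqrt{n}}}, \qquad \mu^* = (\sqrt{n} + 1)\, \lambda^*, \qquad
\varepsilon^{D_{\mu^*,\lambda^*}}_{\max} = 1 - 2 \lambda^* n^{1/4}. \tag{9} \]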
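A short numerical check of (9) is sketched below. It assumes the reading $\lambda^* = 2 / (2 n^{1/4} + \sqrt{2n + 2\sqrt{n}})$, $\mu^* = (\sqrt{n} + 1)\lambda^*$ and MRE $= 1 - 2\lambda^* n^{1/4}$; this reading is adopted here because it reproduces the theoretical MRE column for $D_{\mu,\lambda}$ in Table 3. The function name is illustrative.

```python
import math

def rhodes_two_param(n):
    """Return (lambda*, mu*, MRE) under the reading of equation (9) stated above."""
    root4 = n ** 0.25
    lam = 2.0 / (2.0 * root4 + math.sqrt(2.0 * n + 2.0 * math.sqrt(n)))
    mu = (math.sqrt(n) + 1.0) * lam
    mre = 1.0 - 2.0 * lam * root4
    return lam, mu, mre

lam4, mu4, mre4 = rhodes_two_param(4)
# Largest overestimate of the weighted form with weights (mu, lam, ..., lam):
# the Euclidean length of the weight vector, minus one.
over4 = math.sqrt(mu4 * mu4 + 3 * lam4 * lam4) - 1.0
```

The largest overestimate `over4` coincides with `mre4`, i.e. the optimal parameters balance the worst overestimate against the worst underestimate.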

Finally, Rhodes investigated the $D_{\mu,\lambda}$ approximations with $0 < \mu < \lambda$. He proved that the optimal solution and
its MRE are given by

A* = - 2 ==, /i* = 0, £m& A =l-A*. 

1 + V?i - 1 

This approximation will not be considered any further since its accuracy is inferior to even the single-parameter
approximation $D_\lambda$.

It should be noted that Rhodes optimized $D_\lambda$ and $D_{\mu,\lambda}$ over $\mathbb{Z}^n$. Therefore, these norms are in fact suboptimal
on $\mathbb{R}^n$ (see §2.5).



2.3. Barni et al.'s approximation

Barni et al. [3] formulated a generic approximation for $D_2$ as

\[ D_B(\mathbf{x}) = \delta \sum_{i=1}^{n} \alpha_i x_{(i)} \]

where $x_{(i)}$ is the $i$-th absolute largest component of $\mathbf{x}$, i.e. $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a permutation of $(|x_1|, |x_2|, \ldots, |x_n|)$
such that $x_{(1)} \ge x_{(2)} \ge \ldots \ge x_{(n)}$. Here $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\delta > 0$ are approximation parameters. Note that
a non-increasing ordering and strict positivity of the component weights, i.e. $\alpha_1 \ge \alpha_2 \ge \ldots \ge \alpha_n > 0$, is a necessary
and sufficient condition for $D_B$ to define a norm.

The minimization of (4) is equivalent to determining the weight vector $\boldsymbol{\alpha}$ and the scale factor $\delta$ that solve the
following minimax problem

\[ \min_{\boldsymbol{\alpha}, \delta} \max_{\mathbf{x} \in V} |D_B(\mathbf{x}) - 1| \tag{10} \]

where $V = \{ \mathbf{x} \in \mathbb{R}^n : x_1 \ge x_2 \ge \ldots \ge x_n \ge 0,\ D_2(\mathbf{x}) = 1 \}$.
The optimal solution and its MRE are given by

\[ \alpha_i^* = \sqrt{i} - \sqrt{i - 1}, \qquad \delta^* = \frac{2}{1 + \sqrt{\sum_{i=1}^{n} (\alpha_i^*)^2}}, \qquad \varepsilon^{D_B}_{\max} = 1 - \delta^*. \tag{11} \]

It should be noted that a similar but less rigorous approach had been published earlier by Ohashi.
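The optimal weights of (11) are simple to tabulate. The sketch below assumes the reading $\delta^* = 2 / (1 + \|\boldsymbol{\alpha}^*\|_2)$ for the scale factor, adopted here because it reproduces the $D_B$ maximum errors reported in Table 3; the function names are illustrative.

```python
import math

def barni_parameters(n):
    """Weights alpha_i* = sqrt(i) - sqrt(i-1) and scale delta* (assumed reading of (11))."""
    alpha = [math.sqrt(i) - math.sqrt(i - 1) for i in range(1, n + 1)]
    delta = 2.0 / (1.0 + math.sqrt(sum(a * a for a in alpha)))
    return alpha, delta

def d_barni(x):
    """D_B(x): scaled weighted sum of the absolute components in non-increasing order."""
    alpha, delta = barni_parameters(len(x))
    ordered = sorted((abs(v) for v in x), reverse=True)
    return delta * sum(a * v for a, v in zip(alpha, ordered))

alpha4, delta4 = barni_parameters(4)
mre4 = 1.0 - delta4
# Worst underestimate: a unit vector along a coordinate axis, where D_B equals delta*.
axis_value = d_barni((0.0, -1.0, 0.0, 0.0))
# Worst overestimate: the unit vector proportional to the weight vector itself.
norm_alpha = math.sqrt(sum(a * a for a in alpha4))
peak_value = d_barni([a / norm_alpha for a in alpha4])
```

The two extreme values sit symmetrically around 1 (`1 - axis_value` equals `peak_value - 1`), which is what one expects from a minimax solution.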

2.4. Seol and Cheun's approximation

Seol and Cheun [4] recently proposed an approximation of the form

\[ D_{a,b}(\mathbf{x}) = a D_\infty(\mathbf{x}) + b D_1(\mathbf{x}) \tag{12} \]

where $a$ and $b$ are strictly positive parameters to be determined by solving the following $2 \times 2$ linear system

\[ a E(D_\infty^2) + b E(D_\infty D_1) = E(D_2 D_\infty), \]
\[ a E(D_\infty D_1) + b E(D_1^2) = E(D_2 D_1), \]

where $E(\cdot)$ is the expectation operator.

Note that the formulation of $D_{a,b}$ is similar to that of $D_{\mu,\lambda}$ (8) in that they both approximate $D_2$ by a linear
combination of $D_\infty$ and $D_1$. These approximations differ in their methodologies for finding the optimal parameters.
Rhodes follows an analytical approach and derives theoretical values for the parameters and the maximum error.
However, he achieves this by sacrificing maximization over $\mathbb{R}^n$, and maximizes only over $\mathbb{Z}^n$. Seol and Cheun follow
an empirical approach where they approximate optimal parameters over $\mathbb{R}^n$, which causes them to sacrifice the
ability to obtain analytical values for the parameters and the maximum error. They estimate the optimal values
of $a$ and $b$ using 100,000 $n$-dimensional vectors whose components are independent and identically distributed,
standard Gaussian random variables.
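The estimation procedure can be sketched as follows: replace each expectation in the $2 \times 2$ system by a sample mean over Gaussian vectors and solve by Cramer's rule. This is an illustrative reimplementation, not the authors' code; the function name, the default sample size of 20,000 (smaller than the 100,000 used in the original study) and the seed are assumptions made here.

```python
import math
import random

def fit_seol_cheun(n, num_samples=20000, seed=0):
    """Estimate (a, b) by solving the 2x2 system with sample means in place of expectations."""
    rng = random.Random(seed)
    s_ii = s_i1 = s_11 = s_2i = s_21 = 0.0
    for _ in range(num_samples):
        mags = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
        d_inf, d_1 = max(mags), sum(mags)
        d_2 = math.sqrt(sum(m * m for m in mags))
        s_ii += d_inf * d_inf   # accumulates E(D_inf^2)
        s_i1 += d_inf * d_1     # accumulates E(D_inf * D_1)
        s_11 += d_1 * d_1       # accumulates E(D_1^2)
        s_2i += d_2 * d_inf     # accumulates E(D_2 * D_inf)
        s_21 += d_2 * d_1       # accumulates E(D_2 * D_1)
    # Cramer's rule; the common factor 1/num_samples cancels in both ratios.
    det = s_ii * s_11 - s_i1 * s_i1
    a = (s_2i * s_11 - s_i1 * s_21) / det
    b = (s_ii * s_21 - s_i1 * s_2i) / det
    return a, b

a2, b2 = fit_seol_cheun(2)
```

For $n = 2$ the weight on the largest component, $a + b$, comes out close to 0.9475, so the error along a coordinate axis is about 0.0525, in line with the $n = 2$ maximum error in Table 4.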

2.5. Comparison of the Euclidean norm approximations

It is easy to see that all of the presented approximations fit into the general form

\[ D(\mathbf{x}) = \sum_{i=1}^{n} w_i x_{(i)} \]

which is a weighted $D_1$ norm.

The component weights for each approximation are given in Table 1. It can be seen that $D_B$ has the most
elaborate design in which each component is assigned a weight proportional to its ranking. However, this weighting
scheme also presents a drawback in that a full ordering of the component absolute values is required (see Table 2).
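The general form translates directly into code. The sketch below uses a full sort for simplicity, which is only strictly needed when all weights differ (as for $D_B$); for the two-level weight vectors a single maximum search would suffice. The helper names and the parameter values are illustrative and are not optimized parameters from the paper.

```python
def weighted_sorted_norm(x, weights):
    """General form: sum of w_i * x_(i), with |x| sorted in non-increasing order."""
    ordered = sorted((abs(v) for v in x), reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def two_level_weights(n, w_first, w_rest):
    """Weight vector (w_first, w_rest, ..., w_rest), the shape shared by the D_inf/D_1 combinations."""
    return [w_first] + [w_rest] * (n - 1)

x = (1.0, -3.0, 2.0)   # illustrative vector

lam = 0.25             # illustrative value only
d_lam = weighted_sorted_norm(x, two_level_weights(3, 1.0, lam))    # 3 + 0.25 * (2 + 1)

a, b = 0.5, 0.25       # illustrative values only
d_ab = weighted_sorted_norm(x, two_level_weights(3, a + b, b))     # 0.75 * 3 + 0.25 * (2 + 1)
```

With weights $(a + b, b, \ldots, b)$ the result equals $a D_\infty(\mathbf{x}) + b D_1(\mathbf{x})$, which for this vector is $0.5 \cdot 3 + 0.25 \cdot 6$.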



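Table 1: Weights for the approximate norms

Norm                  $w_1$         $w_i$ ($i \ne 1$)
$D_\lambda$           $1$           $\lambda'$
$D_{\mu,\lambda}$     $\mu^*$       $\lambda^*$
$D_B$                 $\delta^*$    $\delta^* \alpha_i^*$
$D_{a,b}$             $a + b$       $b$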

Due to their formulations, the MREs for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$ can be calculated analytically using (7), (9), and
(11), respectively. In Figure 1 we plot the theoretical errors for these norms for $n \le 100$. It can be seen that $D_B$ is
not only more accurate than $D_\lambda$ and $D_{\mu,\lambda}$, but it also scales significantly better. Although $D_{\mu,\lambda}$ is more accurate
than $D_\lambda$ when $n$ is small, the difference between the two approximations becomes less significant as $n$ is increased.

The operation counts for each norm are given in Table 2 (ABS: absolute value, COMP: comparison, ADD:
addition, MULT: multiplication, SQRT: square root). The following conclusions can be drawn:

> $D_B$ has the highest computational cost among the approximate norms due to its costly weighting scheme,
which requires sorting of $n$ numbers and $n$ multiplications. For small values of $n$, sorting can be performed
most efficiently by a sorting network. For large values of $n$, sorting requires $O(n \log n)$ comparisons, which
is likely to exceed the cost of the square root operation. Therefore, in high dimensional spaces, e.g. $n > 9$,
$D_B$ provides no computational advantage over $D_2$.
Figure 1: Maximum relative errors for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$

Table 2: Operation counts for the norms

Norm                  ABS    COMP             ADD       MULT    SQRT
$D_\infty$            $n$    $n - 1$          $0$       $0$     $0$
$D_1$                 $n$    $0$              $n - 1$   $0$     $0$
$D_2$                 $0$    $0$              $n - 1$   $n$     $1$
$D_\lambda$           $n$    $n - 1$          $n - 1$   $1$     $0$
$D_{\mu,\lambda}$     $n$    $n - 1$          $n$       $2$     $0$
$D_B$                 $n$    $O(n \log n)$    $n - 1$   $n$     $0$
$D_{a,b}$             $n$    $n - 1$          $n$       $2$     $0$


> $D_\lambda$ has the lowest computational cost among the approximate norms. $D_{\mu,\lambda}$ and $D_{a,b}$ have the same
computational cost, which is slightly higher than that of $D_\lambda$.

> A significant advantage of $D_\lambda$, $D_{\mu,\lambda}$, and $D_{a,b}$ is that they require a fixed number of multiplications (1 or 2)
regardless of the value of $n$.

> $D_\lambda$, $D_{\mu,\lambda}$, and $D_{a,b}$ can be used to approximate $D_2^2$ (squared Euclidean norm) using an extra multiplication.
On the other hand, the computational cost of $D_B$ is higher than that of $D_2^2$ due to the extra absolute value
and sorting operations involved.

In Table 3 we display the average and maximum errors for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$ for $n \le 10$. Average relative error
(ARE) is defined as

\[ \varepsilon^{D}_{\mathrm{avg}} = \frac{1}{|S|} \sum_{\mathbf{x} \in S} |D(\mathbf{x}) - 1| \tag{13} \]

where $S$ is a finite subset of the unit hypersphere $S_2^{n-1}$, and $|S|$ denotes the number of elements in $S$. An efficient
way to pick a random point on $S_2^{n-1}$ is to generate $n$ independent Gaussian random variables $x_1, x_2, \ldots, x_n$ with
zero mean and unit variance. The distribution of the unit vectors

\[ \Big\{ \mathbf{y} = (y_1, y_2, \ldots, y_n) : y_i = x_i \Big/ \Big( \sum_{j=1}^{n} x_j^2 \Big)^{1/2},\ i = 1, 2, \ldots, n \Big\} \]
Table 3: Average and maximum errors for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$

        $D_\lambda$                     $D_{\mu,\lambda}$               $D_B$
 n      ARE      MRE_e    MRE_t        ARE      MRE_e    MRE_t        ARE      MRE_e    MRE_t
 2      0.0?4?   0.0551   0.0551       0.027?   0.0470   0.0470       0.0241   0.0396   0.0396
 3      0.04?1   0.0852   0.0852       0.0?5?   0.0778   0.0778       0.0?00   0.0602   0.0602
 4      0.0455   0.1074   0.1074       0.0420   0.1010   0.1010       0.0345   0.0739   0.0739
 5      0.0460   0.1251   0.1251       0.0447   0.1197   0.1197       0.0377   0.0839   0.0839
 6      0.0458   0.1400   0.1400       0.0462   0.1354   0.1354       0.0401   0.0919   0.0919
 7      0.0454   0.1529   0.1529       0.0469   0.1489   0.1490       0.0418   0.0984   0.0984
 8      0.0448   0.1641   0.1643       0.0471   0.1606   0.1609       0.0431   0.1039   0.1039
 9      0.0442   0.1739   0.1745       0.0471   0.1709   0.1716       0.0440   0.1086   0.1086
 10     0.0435   0.1827   0.1837       0.0469   0.1803   0.1812       0.0447   0.1128   0.1128

Table 4: Average and maximum errors for $D_{a,b}$

        Seol & Cheun          This study
 n      ARE      MRE_e        ARE      MRE_e
 2      0.0200   0.0526       0.0200   0.0525
 3      0.0239   0.0991       0.0239   0.0998
 4      0.0257   0.1342       0.0257   0.1363
 5      0.0268   0.1420       0.0268   0.1649
 6      0.0273   0.1674       0.0273   0.1871
 7      0.0276   0.1772       0.0276   0.1968
 8      0.0277   0.1753       0.0277   0.2076
 9      0.0277   0.1711       0.0277   0.2120
 10     0.0276   0.1526       0.0276   0.2156

will then be uniform over the surface of the hypersphere [10]. For each approximate norm, the ARE and MRE
values were calculated over an increasing number of points, $2^{20}, 2^{21}, \ldots, 2^{32} - 1$ (that are uniformly distributed
on the hypersphere) until the error values converge, i.e. the error values do not differ by more than $\epsilon = 10^{-5}$
in two consecutive iterations. Note that for each norm, two types of maximum error were considered: empirical
maximum error (MRE$_e$), which is calculated numerically over $S$, and the theoretical maximum error (MRE$_t$), which
is calculated analytically using (7), (9), or (11). It can be seen that for $D_B$, the empirical and theoretical maximum
errors agree in all cases, which demonstrates the validity of the presented iterative error calculation scheme. This is not
the case for $D_\lambda$ and $D_{\mu,\lambda}$ since these norms are optimized over $\mathbb{Z}^n$ instead of $\mathbb{R}^n$. Therefore, a perfect agreement
between the empirical and theoretical results should not be expected. Nevertheless, the empirical error never
exceeds the theoretical maximum error, which is expected because we are maximizing over a smaller set.
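The sampling recipe and the two empirical error measures can be sketched as follows. For brevity the sketch uses a single fixed sample size instead of the doubling-until-convergence scheme described above, and it evaluates the city-block norm $D_1$, whose theoretical MRE $\sqrt{n} - 1$ is known in closed form; the function names, sample size and seed are illustrative assumptions.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit hypersphere: normalize n i.i.d. standard Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:   # guard against the (practically impossible) all-zero draw
            return [c / norm for c in v]

def empirical_errors(approx, n, num_points, seed=0):
    """Return (ARE, MRE_e) of `approx` over num_points random unit vectors."""
    rng = random.Random(seed)
    total = worst = 0.0
    for _ in range(num_points):
        err = abs(approx(random_unit_vector(n, rng)) - 1.0)
        total += err
        worst = max(worst, err)
    return total / num_points, worst

def city_block(x):
    return sum(abs(c) for c in x)

are, mre_e = empirical_errors(city_block, 3, 20000)
mre_t = math.sqrt(3.0) - 1.0   # theoretical MRE of D_1 for n = 3
```

Because the maximum is taken over a finite sample, `mre_e` can approach `mre_t` from below but cannot exceed it.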

Table 4 shows the average and maximum errors for $D_{a,b}$. The error values under the column "Seol & Cheun" are
taken from [4] (where the simulations were performed on a set of 100,000 $n$-dimensional vectors whose components
are independent and identically distributed, zero mean, and unit variance Gaussian random variables), whereas
those under the column "This study" were obtained using the aforementioned iterative scheme. It can be seen
that the maximum errors obtained by Seol & Cheun are lower than those that we obtained and the discrepancy
between the outcomes of the two error calculation schemes increases as $n$ is increased. The optimistic maximum
error values given by Seol and Cheun are due to the fact that 100,000 vectors are not enough to cover the surface
of the hypersphere in higher dimensions. This is investigated further in the following section. On the other hand,
the average error values agree perfectly in both calculation schemes.

By examining Tables 3 and 4 the following observations can be made regarding the maximum error:

> $D_B$ is the most accurate approximation in all cases. This is because this norm is designed to minimize the
maximum error and it has a more sophisticated weighting scheme than the other two approximations, i.e. $D_\lambda$
and $D_{\mu,\lambda}$, that are based on the same optimality criterion.

> As is also evident from Figure 1, $D_{\mu,\lambda}$ is slightly more accurate than $D_\lambda$ especially for small values of $n$, in
accordance with the greater degrees of freedom it is afforded.

> $D_{a,b}$ is the least accurate approximation except for $n = 2$. This was expected since this norm is designed to
minimize the mean squared error rather than the maximum error.

> As $n$ is increased, the error increases in all approximations. However, as can be seen from Figure 1, the error
grows faster in some approximations than others.

On the other hand, with respect to average error we can see that:

> As expected, $D_{a,b}$ is the most accurate approximation.

> As $n$ is increased, the error increases consistently for the $D_B$ norm. This is not the case for the $D_\lambda$ and $D_{\mu,\lambda}$
norms. This inconsistent average error behavior is not surprising given the fact that these norms are designed
to minimize the maximum error.

> Interestingly, $D_\lambda$ is more accurate than $D_{\mu,\lambda}$ for $n > 5$. A possible explanation for this phenomenon is that
both approximations are optimized for the maximum error. Since the minimization of the maximum and
average errors are conflicting objectives, it is likely that $D_{\mu,\lambda}$ sacrifices the average error to obtain better
(lower) maximum error. The same relationship holds between $D_{a,b}$ and $D_B$.



3. Sampling on the Unit Hypersphere 

In this section, we demonstrate why a fixed number of samples from the unit hypersphere (i.e. the approach
advocated in [4]) can give biased estimates for the maximum error. The basic reason behind this is the fact that a
fixed number of samples fails to suffice as the dimension of the space increases. The following provides a plausibility
argument as to why this is the case. To this end, we need to consider the notion of covering a sphere 'sufficiently'.
We begin with some definitions.

A closed $n$-ball of radius $r$ with respect to the Euclidean norm, denoted $B_2^n(r)$, is the set of points whose
Euclidean norm is less than or equal to $r$. That is,

\[ B_2^n(r) = \{ \mathbf{x} \in \mathbb{R}^n : D_2(\mathbf{x}) \le r \}. \]

Note that, in particular, the unit hypersphere $S_2^{n-1}$ of $\mathbb{R}^n$ is the boundary of $B_2^n(1)$.

Given an $\epsilon > 0$, we say that a set $C$ of points on $S_2^{n-1}$ is an $\epsilon$-dense covering of $S_2^{n-1}$ if for any $\mathbf{x}$ in $C$, there
exists at least one $\tilde{\mathbf{x}}$ in $C$ (different from $\mathbf{x}$) such that $D_2(\mathbf{x} - \tilde{\mathbf{x}}) < \epsilon$. Essentially, our main purpose here is to give
a rough estimate of the number of points in $C$, where $C$ is an $\epsilon$-dense covering of $S_2^{n-1}$. We would then argue that,
if $\epsilon$ is sufficiently small then $C$ is a fine-enough representation of points on $S_2^{n-1}$. Therefore, we can restrict any
computation that needs to be performed on $S_2^{n-1}$ to the finite set $C$.

The basic idea behind the proof is to approximate $S_2^{n-1}$ by $B_2^{n-1}(\epsilon)$, that is, approximate the unit hypersphere
of $\mathbb{R}^n$ by $(n-1)$-balls of radius $\epsilon$. This is the same principle as approximating a circle in $\mathbb{R}^2$ by tiny line
segments ($B_2^1(\epsilon)$), or the surface of a sphere ($S_2^2$) in $\mathbb{R}^3$ by tiny discs ($B_2^2(\epsilon)$). It is easy to see that, if we choose $\epsilon$
small enough, then the approximation is satisfactory for most practical purposes.

To proceed further, we need a lemma from elementary probability theory, which is known as the coupon collector's
problem [11].



Lemma 1. Given a collection of $c$ distinct objects, the expected number of independent random trials needed to
sample each one of the $c$ objects is $O(c \log c)$.

We can now prove the following result. 

Theorem 1. The expected number of uniformly distributed samples needed to generate an $\epsilon$-dense covering of $S_2^{n-1}$
is $O(N \log N)$ where $N = \dfrac{n \sqrt{\pi}\, \Gamma\!\left(\frac{n-1}{2} + 1\right)}{\Gamma\!\left(\frac{n}{2} + 1\right)} \dfrac{1}{\epsilon^{n-1}}$.

Proof. Let $\epsilon > 0$ be given. We will first count the number of identical copies of $B_2^{n-1}(\epsilon)$-balls that are needed to
approximate $S_2^{n-1}$ in the sense described above. By elementary calculus one can compute the volume of a $B_2^n(r)$
to be

\[ V_n(r) := \frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^n := C_n r^n, \]

where $\Gamma$ is the gamma function. The surface area of this ball is equal to the derivative with respect to $r$ of its
volume:

\[ A_{n-1}(r) = \frac{d}{dr} V_n(r) = n C_n r^{n-1}. \]

Note that $A_{n-1}(1)$ is equal to the surface area of $S_2^{n-1}$. The approximate number of $B_2^{n-1}(\epsilon)$-balls needed to cover
the surface of $S_2^{n-1}$ is the ratio of the surface area of $S_2^{n-1}$ to the volume of $B_2^{n-1}(\epsilon)$, i.e.,

\[ \frac{A_{n-1}(1)}{V_{n-1}(\epsilon)} = \frac{n C_n}{C_{n-1} \epsilon^{n-1}} = \frac{n \pi^{1/2}\, \Gamma\!\left(\frac{n-1}{2} + 1\right)}{\Gamma\!\left(\frac{n}{2} + 1\right)} \frac{1}{\epsilon^{n-1}}. \]

The result now follows once we apply Lemma 1 with $c = N$. □
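To get a feel for how quickly this count grows with the dimension, the ratio of the hypersphere's surface area to the volume of an $(n-1)$-ball of radius $\epsilon$ can be evaluated directly. The sketch below is illustrative: the function names are invented here, the value $\epsilon = 0.1$ used in the usage lines is arbitrary, and the constant hidden in the $O(\cdot)$ of Lemma 1 is simply taken to be 1.

```python
import math

def ball_volume(n, r):
    """Volume of the closed n-ball of radius r: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0) * r ** n

def sphere_area(n):
    """Surface area of the unit hypersphere in R^n, i.e. n * C_n."""
    return n * ball_volume(n, 1.0)

def patch_count(n, eps):
    """Ratio of the hypersphere's area to the volume of an (n-1)-ball of radius eps."""
    return sphere_area(n) / ball_volume(n - 1, eps)

def coupon_estimate(n, eps):
    """Leading-order coupon collector estimate c * ln(c) with c = patch_count(n, eps)."""
    c = patch_count(n, eps)
    return c * math.log(c)

circle_patches = patch_count(2, 0.1)    # circumference 2*pi over segments of length 2*eps
sphere_patches = patch_count(3, 0.1)    # area 4*pi over discs of area pi*eps^2
samples_n10 = coupon_estimate(10, 0.1)
```

Under these assumptions the estimate for $n = 10$ already exceeds 100,000 samples by several orders of magnitude, which is consistent with the discrepancy seen in Table 4.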

In the light of the following result, we see that the actual number of samples required does not deviate significantly
from the value provided by Theorem 1.

Theorem 2. Let $X$ be the number of samples observed before obtaining one in each region. Then, for any constant
$s > 0$ we have

\[ P(X > c \ln c + sc) \le e^{-s}. \]

Proof. The probability of not obtaining the $i$-th region after $c \ln c + sc$ steps is

\[ \left( 1 - \frac{1}{c} \right)^{c (\ln c + s)} \le e^{-(\ln c + s)} = \frac{1}{e^{s} c}. \]

By a union bound over the $c$ regions, the probability that some region has not been obtained after $c \ln c + sc$ steps is at most $e^{-s}$. □
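The tail bound can be checked with a small simulation of the coupon collector process. The sketch below is illustrative only: the number of regions $c = 50$, the constant $s = 1$, the trial count and the seed are arbitrary choices made here, and the function names are invented.

```python
import math
import random

def draws_to_collect_all(c, rng):
    """Number of uniform draws from c regions until every region has been hit."""
    seen, draws = set(), 0
    while len(seen) < c:
        seen.add(rng.randrange(c))
        draws += 1
    return draws

def tail_fraction(c, s, trials=2000, seed=0):
    """Fraction of trials in which more than c*ln(c) + s*c draws were needed."""
    rng = random.Random(seed)
    threshold = c * math.log(c) + s * c
    exceed = sum(draws_to_collect_all(c, rng) > threshold for _ in range(trials))
    return exceed / trials

frac = tail_fraction(50, 1.0)
bound = math.exp(-1.0)   # Theorem 2 with s = 1
```

Up to sampling noise, the observed fraction should stay below the bound $e^{-s}$.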
Note that one can use a Chernoff bound to obtain an even tighter bound in Theorem 2 since

\[ \lim_{c \to \infty} P(X > c \ln c + sc) = 1 - e^{-e^{-s}}. \]

See [11] for details.

We should note that in order to apply Lemma 1 the patches used to cover $S_2^{n-1}$ should be disjoint, which is
clearly not the case since we have used $B_2^{n-1}(\epsilon)$ for this purpose. This leads to an overestimate of the samples
needed to obtain a dense covering, and thus the argument presented in this section is only a rough estimate.
However, as empirically demonstrated in the previous section, a fixed number of samples as in [4] is definitely not
sufficient either. To come up with a tight estimate of the number of sample points needed, one has to express $S_2^{n-1}$
as a disjoint union of small patches. The delicacy lies in the requirement that this has to be achieved through a
constructive process in a way that the surface area of each patch can be explicitly computed as a function of the
dimension $n$, and a characteristic measure $\epsilon$. To the best of the authors' knowledge there is no systematic method
in the literature to achieve this.



4. Conclusions 

In this paper, we investigated the theoretical and practical aspects of several Euclidean norm approximations
in the literature and showed that these are in fact special cases of the weighted city-block norm. We evaluated
the average and maximum errors of these norms using numerical simulations. Finally, we demonstrated that the
maximum errors given in a recent study [4] are significantly optimistic.

The implementations of the approximate norms described in this paper will be made publicly available at
http://www.lsus.edu/faculty/~ecelebi/research.htm